A reenactment

Student: I have found it! A significant relationship between sex
and political party affiliation. Look at this 2x2 contingency table,
and the chi-square value is significant at the .01 level.

Table 1. Frequencies of political party affiliation by sex 
of respondent. 

            Democrat   Republican
Male           60          40
Female         40          60

Faculty Member (while scratching on a pad of paper and punching on
his calculator): But are you sure that you have found a meaningful
relationship?

S: What do you mean? It's significant and chi-squares with one degree
of freedom seldom get beyond 6.

F (while fumbling through a file cabinet to get a scatter diagram):
But what if I show you a bivariate scatterplot illustrating the strength
of the relationship you have shown? (See Figure 1.)

S (crestfallen): But there is no relationship there. 

F: Yes there is. A significant one at the .05 level of significance, 
for the 200 points, r = .2. 

S: But that means only 4% of the variance is accounted for, 96% is 
unexplained. 

F: Yes, and that is the strength of the relationship you have found 
with the chi-square analysis. 

(The scene continues and ends with the student and faculty member
commiserating at a local hangout.)
Figure 1. A scatter diagram of variables X and Y, where both means
equal 50, $\sigma_X = \sigma_Y = 10$, N = 200, and r = .2.
The purpose of this paper is to emphasize the interpretation of $R^2$
in multiple regression that carries over to chi-square contingency table
analysis.

2x2 Contingency Tables 

Multiple regression aficionados would not have found themselves in
the role of the student in the above scenario. By "dummy coding" sex and
political party affiliation, and regressing one on the other, an $R^2$
(actually $r^2$) value is obtained which is numerically equal to
$\chi^2/N$, where N is the number of subjects on which the two variables
are measured (McNeil, Kelly, and McNeil, 1975, pp. 246-248). In general,
if row variable A has two levels, and column variable B has two levels, let



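$$
X = \begin{cases} 1 & \text{if observation is from level 1 of } A \\ 0 & \text{otherwise} \end{cases}
\qquad
Y = \begin{cases} 1 & \text{if observation is from level 1 of } B \\ 0 & \text{otherwise} \end{cases}
\tag{1}
$$

then $r_{xy}^2 = \chi^2/N$. (See proof in Bishop, Fienberg, and Holland, 1975, p. 382.)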
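
As a quick numerical check of the identity between the squared correlation and chi-square / N, the following Python sketch expands Table 1 into one dummy-coded (X, Y) pair per respondent and compares the two quantities. The function names are illustrative and not from the paper, and the chi-square is computed without a continuity correction, which is an assumption about how the value in the opening scene was obtained.

```python
from math import sqrt

# Cell counts from Table 1: rows are sex, columns are party.
table = [[60, 40],   # Male:   Democrat, Republican
         [40, 60]]   # Female: Democrat, Republican


def chi_square(table):
    """Pearson chi-square statistic for independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def dummy_scores(table):
    """Expand a 2x2 table to one (X, Y) pair per respondent, coded as in (1)."""
    xs, ys = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            xs += [1 if i == 0 else 0] * count   # X = 1 for level 1 of A
            ys += [1 if j == 0 else 0] * count   # Y = 1 for level 1 of B
    return xs, ys


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs, ys = dummy_scores(table)
n = len(xs)
chi2 = chi_square(table)
r = pearson_r(xs, ys)

print(f"chi-square     = {chi2:.4f}")
print(f"r              = {r:.4f}")
print(f"r squared      = {r ** 2:.4f}")
print(f"chi-square / N = {chi2 / n:.4f}")
```

For Table 1 this gives a chi-square of 8 and r = .2, so both r squared and chi-square / N come to .04, the 4% of variance cited in the dialogue.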

Another related statistic is the phi ($\phi$) coefficient developed by Karl
Pearson. If a, b, c, and d denote cell frequencies as indicated by the
table below, $\phi$ can be computed directly using the formula that
follows it.

                 B
              1     2
    A    1    a     b
         2    c     d

$$
\phi = \frac{bc - ad}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}
$$
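
The direct formula for phi can be sketched in a few lines of Python. The cell layout (a and b in the first row of A, c and d in the second) is an assumption read off the marginal sums in the denominator, and the function name is illustrative.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with row 1 = (a, b) and row 2 = (c, d)."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Table 1 frequencies: Male (60, 40), Female (40, 60).
a, b, c, d = 60, 40, 40, 60
n = a + b + c + d
phi = phi_coefficient(a, b, c, d)

print(f"phi           = {phi:.4f}")
print(f"phi squared   = {phi ** 2:.4f}")
print(f"N * phi^2     = {n * phi ** 2:.4f}")
```

With the Table 1 frequencies, phi is -.2 under this layout. The sign only reflects which level is listed first; phi squared is .04, and N times phi squared recovers the chi-square value of 8.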



But the formula for $\phi$ can be derived mathematically from the formula for
the Pearson product-moment correlation coefficient. (See Glass and Stanley,
1970, pp. 158-160.) So we have $r_{xy}^2 = \phi^2 = \chi^2/N$.

Unlike the bivariate scatterplot of two continuously distributed
variables as in Figure 1, the plot of X and Y in (1) does not show much.
But the interpretation of $r^2$ (the coefficient of determination) as the
proportion of variance in one variable explained by variation in the
other holds for the categorical as well as the continuous case. (With
a computer package like SAS (Barr et al., 1976), it is easy to demonstrate
this by calculating and printing predicted and residual scores, and
computing their variances and $r^2$.)
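
The demonstration described for SAS can be approximated in Python instead (a swapped-in tool, not the one the paper used). The sketch below regresses the dummy-coded Y on X for the Table 1 frequencies, computes predicted and residual scores, and compares their variances. Variable and function names are illustrative, and the coding of male and Democrat as level 1 is an assumption.

```python
def variance(values):
    """Population variance (divide by N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# (X, Y) dummy codes with their Table 1 frequencies.
# Assumed coding: X = 1 for male, Y = 1 for Democrat.
cells = {(1, 1): 60, (1, 0): 40, (0, 1): 40, (0, 0): 60}
xs, ys = [], []
for (x, y), count in cells.items():
    xs += [x] * count
    ys += [y] * count

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept for the simple regression of Y on X.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in xs]
residual = [y - p for y, p in zip(ys, predicted)]

var_y = variance(ys)
var_predicted = variance(predicted)
var_residual = variance(residual)
r_squared = var_predicted / var_y

print(f"var(Y)         = {var_y:.4f}")
print(f"var(predicted) = {var_predicted:.4f}")
print(f"var(residual)  = {var_residual:.4f}")
print(f"r squared      = {r_squared:.4f}")
```

The variance of Y (.25) splits into a predicted part (.01) and a residual part (.24), so the proportion explained is .01 / .25 = .04 even though both variables are categorical.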

R x 2 Contingency Table

If the row variable has more than two categories, an R x 2 contingency 
table can be constructed, and the coding method in (1) can be extended. 
Code Y as before, and extend the coding of X as follows: 

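$$
\begin{aligned}
X_1 &= \begin{cases} 1 & \text{if observation is from level 1 of } A \\ 0 & \text{otherwise} \end{cases} \\
X_2 &= \begin{cases} 1 & \text{if observation is from level 2 of } A \\ 0 & \text{otherwise} \end{cases} \\
&\;\;\vdots \\
X_{R-1} &= \begin{cases} 1 & \text{if observation is from level } R-1 \text{ of } A \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
\tag{2}
$$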



Regressing Y on $X_1, X_2, \ldots, X_{R-1}$ yields an $R^2$ which equals
$\chi^2/N$, where $\chi^2$ is the test statistic for independence. Again,
we can use our notions of $R^2$ to add meaning to the relationship between
A and B tested using the value of $\chi^2$.
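
To see the R x 2 claim numerically, the Python sketch below dummy codes a 3 x 2 table, regresses Y on the two X variables by ordinary least squares, and compares the resulting R squared with chi-square / N. The frequencies are hypothetical (they do not come from the paper), the helper names are illustrative, and the small normal-equations solver is just one convenient way to fit the regression without external libraries.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R^2 from least-squares regression of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return 1.0 - ss_resid / ss_total


def chi_square(table):
    """Pearson chi-square statistic for independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 3 x 2 table: rows are levels of A, columns are levels of B.
table = [[30, 20],
         [25, 25],
         [15, 35]]

n_rows = len(table)
predictors, y = [], []
for i, row in enumerate(table):
    for j, count in enumerate(row):
        # X_1 ... X_{R-1} as in (2); the last level of A is coded all zeros.
        x = [1.0 if i == level else 0.0 for level in range(n_rows - 1)]
        predictors += [x] * count
        y += [1.0 if j == 0 else 0.0] * count   # Y = 1 for level 1 of B

n = len(y)
chi2 = chi_square(table)
r2 = r_squared(predictors, y)

print(f"chi-square / N = {chi2 / n:.4f}")
print(f"R squared      = {r2:.4f}")
```

For these made-up frequencies the chi-square is 9.375 on N = 150, and both printed values come to .0625.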
A modification of the phi coefficient is made for contingency
tables larger than 2x2: Cramer's V is given by

$$
V = \sqrt{\frac{\phi^2}{\min\{(R-1),\,(C-1)\}}}
$$

(The denominator is the maximum that $\phi^2$ attains, so that V ranges from
0 when no relationship is present to a value of 1.) Substituting $R^2$
for $\phi^2$ and solving for $R^2$ gives

$$
R^2 = V^2 \min\{(R-1),\,(C-1)\} \tag{3}
$$

So from Cramer's V or the $\chi^2$ value, we can compute the proportion of
variance in one variable accounted for by the other.
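
A minimal Python sketch of Cramer's V and of the conversion in equation (3) follows. It assumes phi squared is taken as chi-square / N; the tables are hypothetical and the function names are illustrative.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramer's V: square root of (chi-square / N) over min(R - 1, C - 1)."""
    n = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return sqrt(chi_square(table) / n / k)


# Hypothetical 3 x 4 table of frequencies.
table = [[16, 10, 8, 6],
         [9, 12, 10, 9],
         [5, 8, 12, 15]]

n = sum(sum(row) for row in table)
k = min(len(table) - 1, len(table[0]) - 1)
chi2 = chi_square(table)
v = cramers_v(table)
r2_from_v = v ** 2 * k   # equation (3)

print(f"Cramer's V         = {v:.4f}")
print(f"R squared from (3) = {r2_from_v:.4f}")
print(f"chi-square / N     = {chi2 / n:.4f}")

# A hypothetical table with perfect association, where V reaches its maximum.
perfect = [[10, 0, 0],
           [0, 10, 0],
           [0, 0, 10]]
v_perfect = cramers_v(perfect)
print(f"V, perfect table   = {v_perfect:.4f}")
```

For the first table V is about .22 and equation (3) returns .10, the same as chi-square / N; the diagonal table gives V = 1.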

R x C Contingency Table

The most general form of the contingency table has R rows for variable
A and C columns for variable B. Variable A may be coded as in (2).
But the coding of Y needs to be an orthogonal partition of the
variability in B. This is not difficult, using orthogonal polynomials,
providing the frequencies in each level of A are equal. Assuming variable B
to have four levels, code Y as follows:

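$$
\begin{aligned}
Y_1 &= \begin{cases} 1 & \text{if observation is from level 1 of } B \\ -1 & \text{if observation is from level 2 of } B \\ 0 & \text{otherwise} \end{cases} \\
Y_2 &= \begin{cases} 1 & \text{if observation is from level 1 or 2 of } B \\ -2 & \text{if observation is from level 3 of } B \\ 0 & \text{otherwise} \end{cases} \\
Y_3 &= \begin{cases} 1 & \text{if observation is from levels 1, 2, or 3 of } B \\ -3 & \text{otherwise} \end{cases}
\end{aligned}
$$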

Then regress each $Y_i$ ($i = 1, 2, \ldots, C-1$) on $X_1, X_2, \ldots, X_{R-1}$,
denoting the respective multiple correlation coefficients by $R_i$.
Then

$$
\sum_{i=1}^{C-1} R_i^2
$$

is equal to $\chi^2/N$ from the R x C table. Also, Cramer's V computed on the
table using equation (3) yields an $R^2$ equal to $\chi^2/N$.
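
The R x C procedure can be sketched in Python as follows: code A with dummy variables, code B with the three orthogonal Y variables, regress each Y on the X set, and add up the squared multiple correlations. The 3 x 4 frequencies are hypothetical and were chosen so that both the row totals and the column totals are equal, which keeps the Y codes mutually uncorrelated in this example. The helper names are illustrative, and the small least-squares solver stands in for a regression package.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R^2 from least-squares regression of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return 1.0 - ss_resid / ss_total


def chi_square(table):
    """Pearson chi-square statistic for independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 3 x 4 table with equal row totals (40) and column totals (30).
table = [[16, 10, 8, 6],
         [9, 12, 10, 9],
         [5, 8, 12, 15]]

# Value of each Y variable at levels 1 to 4 of B.
y_codes = [
    [1.0, -1.0, 0.0, 0.0],    # Y_1
    [1.0, 1.0, -2.0, 0.0],    # Y_2
    [1.0, 1.0, 1.0, -3.0],    # Y_3
]

n_rows = len(table)
predictors, levels_of_b = [], []
for i, row in enumerate(table):
    for j, count in enumerate(row):
        # Dummy codes X_1 ... X_{R-1}; the last level of A is all zeros.
        x = [1.0 if i == level else 0.0 for level in range(n_rows - 1)]
        predictors += [x] * count
        levels_of_b += [j] * count

n = len(levels_of_b)
chi2 = chi_square(table)
r2_each = [
    r_squared(predictors, [codes[j] for j in levels_of_b]) for codes in y_codes
]

for index, value in enumerate(r2_each, start=1):
    print(f"R_{index} squared = {value:.4f}")
print(f"sum of R_i squared = {sum(r2_each):.4f}")
print(f"chi-square / N     = {chi2 / n:.4f}")
```

For these frequencies the chi-square is 12 on N = 120, and the three squared multiple correlations add up to the same .10.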

It was hoped to relate $R^2$ to Wilks' Lambda, since multiple regression
and multivariate analysis of variance have so much in common (Kerlinger
and Pedhazur, 1973, pp. 353 ff.). But while apparently related, the exact
connection could not be found by this author.

Conclusion 

The purpose of this paper was to relate common statistics from
contingency table analysis to the more familiar $R^2$ terminology in order to
better understand the strength of the relation implied. The method of
coding contingency tables in order to compute $R^2$ was shown, along with how
$R^2$ relates to $\phi$, V, and $\chi^2$. It is not implied that all contingency
tables be recoded so that multiple regression can be performed, but it is hoped
that proportion of variance interpretations be done in addition to tests
of significance.